feat: Add multiple GPU support #760
Add support to manage multiple GPUs similar to CPUs.

- Rename gpu to gpu memory across the board to make way for gpu min and max values.
- Rename mem to memory to be more descriptive.
- Make gpu memory a proper value in the host reports, not an additional attribute.
- Add setting and updating GPU and GPU memory counts via the API.
- The GPU list is given to frames in RQD using the CUE_GPU_CORES env variable.

Missing from the MR:

1. For simplicity I modified the initial migration to incorporate all the changes needed in the tables, functions and triggers. To keep backward compatibility for users, this will need to be made into a new migration.
2. Our cuegui and rqd have diverged too much for an easy merge. I've ported what I can, but it will likely be missing elements.
3. We don't use Windows; the GPU RQD side uses nvidia-smi directly. We will want to find an OS-agnostic method.
4. Tests. We will definitely need to write some tests.
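
For readers wondering how a frame might consume the GPU list mentioned in the description, here is a minimal sketch. It assumes the CUE_GPU_CORES value is a comma-separated list of device indices such as "0,1"; the description does not state the format, so treat that as an assumption. The helper name `parse_gpu_list` is hypothetical and not part of RQD.

```python
import os


def parse_gpu_list(env=None):
    """Return the GPU device indices listed in CUE_GPU_CORES, or [] if unset.

    Assumption: the variable holds comma-separated integers, e.g. "0,1".
    """
    env = os.environ if env is None else env
    raw = env.get("CUE_GPU_CORES", "")
    # Skip empty tokens so an unset or empty variable yields an empty list.
    return [int(token) for token in raw.split(",") if token.strip()]


if __name__ == "__main__":
    gpus = parse_gpu_list()
    print("GPUs assigned to this frame:", gpus)
```

A frame command could, for example, forward the result to `CUDA_VISIBLE_DEVICES` so that a CUDA application only sees the devices it was assigned. Whether that is appropriate depends on the site's setup.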
babeb19 to cc2585b
Hi Lars, first of all sorry for leaving this for so long.
I've reviewed it, and overall it looks good. I have some minor comments here, and there are some conflicts that need to be resolved, but if we can clear these up I think we can wrap this one up quickly.
@@ -17,6 +17,7 @@ CREATE TABLE frame_history (
int_mem_reserved BIGINT DEFAULT 0 NOT NULL,
We will definitely want to do this as a new migration instead of updating an existing migration.
That will also necessitate a minor version bump.
@@ -29,15 +29,15 @@
cores NMTOKEN #REQUIRED
Similarly, this should become a new 1.10 version instead of modifying the existing version.
@larsbijl @bcipriano Do you mind if I help with this pull request? We'd like to use similar 8-GPU machines.
No objections here! We just discussed this at our last TSC meeting as a major item that would be great to wrap up ASAP.
By all means! Please go for it :)
- Original AcademySoftwareFoundation#760 V1__Initial_schema.sql
  - Whole file https://gist.github.com/splhack/dd51d30723a5b1ee20c599290d85e25a
  - Diff https://gist.github.com/splhack/fab852153505c65742909917888c51bc
- Changes from AcademySoftwareFoundation#760
  - Use BIGINT for GPU memory
  - Clean up unrelated changes
  - Create new indexes
- V1 -> V10 view/function changes
  - Diff https://gist.github.com/splhack/2e52913d8e52e6dd4dd77e26ff8209fa
  - Edited V1 for compare https://gist.github.com/splhack/7ceb1f75745dd64f0ef4c0d907f0cb36

Co-authored-by: Lars van der Bijl <[email protected]>
@larsbijl @bcipriano #924 is ready for code review. I may force push it again if I find something though.
#924 is merged, we can close this 🙂
Fix "Monitor Cue" with incorrect column indexing for "Min" and "Max" columns

- Fix the column indexing in the "addColumn" calls of class CueJobMonitorTree.
- This bug was introduced by the merge of the pull request "Add multiple GPU support #760 (#924)" on 4/18/22 at 11:45 AM, which added the following new columns to the CueJobMonitorTree: "Gpus", "Min Gpus", "Max Gpus", "MaxGpuMem". The column indexes were defined incorrectly.
Closes #459 #460